More Features Are Not Always Better: Evaluating Generalizing Models in Incident Type Classification of Tweets

Authors

Axel Schulz

Christian Guckelsberger

Benedikt Schmidt

Abstract

Social media is a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity for further processing. This learning problem is often concerned with regionally restricted datasets, such as data from only one city. Because social media data such as tweets varies considerably across different cities, training efficient models requires labeling data from each city of interest, which is costly and time-consuming. In this study, we investigate which features are most suitable for training generalizable models, i.e., models that perform well across different datasets. We reimplemented the most popular features from the state of the art, in addition to other novel approaches, and evaluated them on data from ten different cities. We show that many sophisticated features are not necessarily valuable for training a generalized model and are outperformed by classic features such as plain word n-grams and character n-grams.
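
To make the classic features named in the abstract concrete, the following is a minimal sketch of extracting word n-grams and character n-grams from a tweet. It is illustrative only and not the authors' implementation: the whitespace tokenization, the lowercasing, the function names, the `w:`/`c:` feature prefixes, and the chosen n-gram orders are all assumptions.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams using naive whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character n-grams over the lowercased text, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def featurize(text, word_n=(1, 2), char_n=(3,)):
    """Count word and character n-grams into one sparse feature dictionary.

    The n-gram orders are hypothetical defaults, not values from the paper.
    """
    feats = Counter()
    for n in word_n:
        feats.update(f"w:{g}" for g in word_ngrams(text, n))
    for n in char_n:
        feats.update(f"c:{g}" for g in char_ngrams(text, n))
    return feats


if __name__ == "__main__":
    # Hypothetical example tweet, not taken from the paper's datasets.
    print(featurize("Car crash on Main Street"))
```

A feature dictionary of this kind could be passed to any standard text classifier. Testing generalization in the sense of the abstract would then mean training on tweets from one city and evaluating on tweets from another.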

Related Resources

A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks

The rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. Therefore, identifying the rumor language can be helpful in identifying it. The previous research has focused more on the contextual information to reply tweets and less on the content features of the original rumor to address the rumor detection problem. Most of the studies have been in...

MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...

Classifier Ensemble Framework: a Diversity Based Approach

Pattern recognition systems are widely used in a host of different fields. Due to some reasons such as lack of knowledge about a method based on which the best classifier is detected for any arbitrary problem, and thanks to significant improvement in accuracy, researchers turn to ensemble methods in almost every task of pattern recognition. Classification as a major task in pattern recognition,...

Semantic Abstraction for generalization of tweet classification: An evaluation of incident-related tweets

Social media is a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity to process this information further. This learning problem is often concerned with regionally restricted datasets such as data from only one city. Because social media data such as tweets varies considerably across differ...

Presenting a three-objective model in location-allocation problems using combinational interval full-ranking and maximal covering with backup model

Covering models have found many applications in a wide variety of real-world problems; nevertheless, some assumptions of covering models are not realistic enough. Accordingly, a general approach would not be able to answer the needs of encountering varied aspects of real-world considerations. Assumptions like the unavailability of servers, uncertainty, and evaluating more factors at the same ti...

Publication date: 2015
